    Valentine: Evaluating Matching Techniques for Dataset Discovery

    Data scientists today search large data lakes to discover and integrate datasets. In order to bring together disparate data sources, dataset discovery methods rely on some form of schema matching: the process of establishing correspondences between datasets. Traditionally, schema matching has been used to find matching pairs of columns between a source and a target schema. However, the use of schema matching in dataset discovery methods differs from its original use. Nowadays schema matching serves as a building block for indicating and ranking inter-dataset relationships. Surprisingly, although a discovery method's success relies heavily on the quality of the underlying matching algorithms, the latest discovery methods employ existing schema matching algorithms in an ad-hoc fashion due to the lack of openly available datasets with ground truth, reference method implementations, and evaluation metrics. In this paper, we aim to rectify the problem of evaluating the effectiveness and efficiency of schema matching methods for the specific needs of dataset discovery. To this end, we propose Valentine, an extensible open-source experiment suite to execute and organize large-scale automated matching experiments on tabular data. Valentine includes implementations of seminal schema matching methods that we either implemented from scratch (due to absence of open source code) or imported from open repositories. The contributions of Valentine are: i) the definition of four schema matching scenarios as encountered in dataset discovery methods, ii) a principled dataset fabrication process tailored to the scope of dataset discovery methods, and iii) the most comprehensive evaluation of schema matching techniques to date, offering insight into the strengths and weaknesses of existing techniques, which can serve as a guide for employing schema matching in future dataset discovery methods.

    Overfeeding Reduces Insulin Sensitivity and Increases Oxidative Stress, without Altering Markers of Mitochondrial Content and Function in Humans

    BACKGROUND: Mitochondrial dysfunction and increased oxidative stress are associated with obesity and type 2 diabetes. High-fat feeding induces insulin resistance and increases skeletal muscle oxidative stress in rodents, but there is controversy as to whether skeletal muscle mitochondrial biogenesis and function are altered. METHODOLOGY AND PRINCIPAL FINDINGS: Forty (37±2 y) non-obese (25.6±0.6 kg/m2) sedentary men (n = 20) and women (n = 20) were overfed (+1040±100 kcal/day, 46±1% of energy from fat) for 28 days. Hyperinsulinemic-euglycemic clamps were performed at baseline and day 28 of overfeeding, and skeletal muscle biopsies were taken at baseline, day 3 and day 28 of overfeeding in a subcohort of 26 individuals (13 men and 13 women) who consented to having all 3 biopsies performed. Weight increased on average in the whole cohort by 0.6±0.1 and 2.7±0.3 kg at days 3 and 28, respectively (P<0.0001), without a significant difference in the response between men and women (P = 0.4). Glucose infusion rate during the hyperinsulinemic-euglycemic clamp decreased from 54.8±2.8 at baseline to 50.3±2.5 mmol/min/kg FFM at day 28 of overfeeding (P = 0.03), without a significant difference between men and women (P = 0.4). Skeletal muscle protein carbonyls and urinary F2-isoprostanes increased with overfeeding (P<0.05). Protein levels of muscle peroxisome proliferator-activated receptor gamma coactivator-1α (PGC1α) and subunits from complexes I, II and V of the electron transport chain were increased at day 3 (all P<0.05) and returned to basal levels at day 28. No changes were detected in muscle citrate synthase activity or ex vivo CO2 production at either time point. CONCLUSIONS: Peripheral insulin resistance was induced by overfeeding, without reducing any of the markers of mitochondrial content that were examined. Oxidative stress was, however, increased, and may have contributed to the reduction in insulin sensitivity observed.
    Dorit Samocha-Bonet, Lesley V. Campbell, Trevor A. Mori, Kevin D. Croft, Jerry R. Greenfield, Nigel Turner and Leonie K. Heilbron

    Factors Affecting Cloud Infra-Service Development Lead Times: A Case Study at ING

    The development of Cloud Infra-Services has shifted over the past decade toward a software code development process, also known as infrastructure as code (IaC). Contemporary continuous delivery settings in industry require fast feedback. As a consequence, companies need insight into time spent, especially in the development of such services. We examine a series of 28 Cloud Infra-Services within ING and explore which factors affect their overall time to market and development time. An initial perception among several stakeholders in the Cloud Infra-Service development process, that Cloud Infra-Services within ING take longer than those in peer companies, is not confirmed by our benchmark. Development team members identified three factors that negatively affect the time to internal market of services: the portal where consumers can order a service, the Orchestration Workflows, and team dynamics. This perception is supported by additional metrics. We propose that promising ways to reduce lead time include reducing the complexity of the ING environment, treating Cloud Infra-Services like regular software deliveries, and reducing the dependencies between teams in terms of tooling and collaboration.
    Software Engineering; Computer Science & Engineering-Teaching Team; Software Technolog